Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additions to agent protocol #717

Merged
merged 25 commits into from
Jun 27, 2024
Merged

Additions to agent protocol #717

merged 25 commits into from
Jun 27, 2024

Conversation

biglittlebigben
Copy link
Contributor

@biglittlebigben biglittlebigben commented May 16, 2024

This PR:

  • Adds StartAgentJobRequest/StopAgentJobRequest calls to start an handle manually mange an agent lifecycle
  • Deprecates the JobType field in WorkerInfo as workers will now be expected to handle both kinds jobs (if the namespace supports both kinds in the first place)
  • Adds a metadata string -> string map to Job to allow agent implementation to take anonymous (to us) parameters
  • Adds a JobStatus and error fields to Job for status reporting (Similar to egress and ingress)
  • Makes livekit_agent.proto indentation consistent with other proto files
  • Add room configuration and management APIs to the Room service. Rooms configuration allow bundling most room configuration parameters under a single id, that can eventually be put as part of the room creation token.

Copy link

changeset-bot bot commented May 16, 2024

⚠️ No Changeset found

Latest commit: d82084e

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

💥 An error occurred when fetching the changed packages and changesets in this PR
Some errors occurred when validating the changesets config:
The package or glob expression "github.com/livekit/protocol" specified in the `fixed` option does not match any package in the project. You may have misspelled the package name or provided an invalid glob expression. Note that glob expressions must be defined according to https://www.npmjs.com/package/micromatch.

Copy link
Contributor

@paulwe paulwe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm 🚀

string participant_metadata = 6;
string participant_name = 4;
string participant_identity = 5;
string participant_metadata = 6;
}

message UpdateJobStatus {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't implemented in the agents client... https://github.com/search?q=org%3Alivekit%20UpdateJobStatus&type=code

since every job will have a participant connection we could maybe remove this and use connection status?

  • if a participant is connected the job is ok
  • if the client disconnects uncleanly it was in error and we should reassign it
  • if the client disconnects cleanly the job is done and completed successfully

Copy link
Member

@theomonnom theomonnom May 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it may be OK, but we may want to implement Job metadata updates in the future (not a priority for now)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

job metadata is immutable. the agent can use participant metadata or its own external storage

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stepping back: what information is conveyed in the participant* fields? Aren't the participants identified by the Job? If not, what participant is this referring to?

Copy link
Contributor

@paulwe paulwe May 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was referring only to UpdateJobStatus - gh included irrelevant lines. to answer your question though - the agent service sends an availability request to the client. if the client is available to handle the job the response includes participant info used to create a token. the token is then passed back to the client in the job assignment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. This PR deprecates the field for now. from the above, it looks like I should rather remove it entirely?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably clean it up in lk first - the service implements it - but yeah, i think so

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Next step is to implement this in lk anyway. Probably makes sense to do it before I merge this.

message StartAgentJobRequest {
JobType type = 1;
Room room = 2;
optional ParticipantInfo participant = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably only need publisher identity here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for just the room name mb?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Room room = 2;
optional ParticipantInfo participant = 3;
string namespace = 4;
map<string, string> metadata = 5;
Copy link
Member

@theomonnom theomonnom May 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this just be a string? (let user serialize stuff if they want)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We currently use a map for egress, a string for other entities (participant, etc). Either has pros and cons. The map lets us potentially stash fields in there as well using a reserved namespace. The string is more flexible...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in cases where we might be tempted to extend the user supplied metadata with values from the service we should probably add structured fields. for example we might include an optional field for sip data with a struct containing phone #, trunk id, etc...

@@ -266,7 +266,8 @@ message SipDTMF {
}

message Transcription {
string participant_identity = 2;
// Participant that got its speech transcribed
string transcribed_participant_identity = 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems redundant to me? everything in a Transcription should be scoped to the transcription.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a part of #706 probably. It was confusing to see two participant_identities in data packets in SDK.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that.. would comments in the protobuf help to clear that up? Just looking at a Transcription object (without context on what is using the message), it seems to break some naming patterns to have transcribed_ again

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe source_ then?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that sgtm!

@@ -59,11 +60,22 @@ service RoomService {

// Update room metadata, will cause updates to be broadcasted to everyone in the room, Requires `roomAdmin`
rpc UpdateRoomMetadata (UpdateRoomMetadataRequest) returns (Room);

// Create a room configuration.
rpc CreateRoomConfiguration(CreateRoomConfigurationRequest) returns (RoomConfiguration);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth discussing: should these be APIs on room service? or Project settings in Cloud (and config in OSS)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if they are on room, we'd have to figure out permissions.. what token permissions would they require

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A "configuration service" would work too. I have no strong opinion either way. Does anybody else have any input? Either way, we'll need to sort out permissions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would propose not to have a service for this initially. OSS would configure these in the config yaml. Cloud would store it per project.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would cloud allow defining these configurations?

Comment on lines 169 to 170
// The load that the job currently has on the worker
float load = 5;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as convenient as it would be to have this does the agent client have any way to give us job load info? isn't most of the work done in a single process? @theomonnom ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Egress runs each worker in a separate process and reports load for each ingress job separately, but that took a not insignificant amount of effort to implement and may be a too high burden to put on all agent implementations. I agree we should delete it unless @theomonnom has some further input on how this would be used?

Comment on lines 48 to 53
JobStatus status = 7;
string error = 8;
float load = 9;
int64 started_at = 10;
int64 ended_at = 11;
int64 updated_at = 12;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure we want any of these on Job - this is the job description passed to the worker to initialize a job.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mixes Job Model and Execution status in the same message indeed. This is the pattern that is also used by Egress. Most consumers of the Job object need its current or latest state as well, which makes it convenient to have a message that makes it easy to send them together.

The IngressInfo message attempts to separate the Model and State somehow by packaging all the state related entries into a an IngressState message that is in turns included inside the Ingress message. The IngressState entry is null when the Ingress is used purely as model (which only happens at initialization).

An laternative here would be to flip the Job and JobState relationship compared to ingress:

message JobState {
   JobStatus status = 1;
   string error = 2;
   float load = 3;
   int64 started_at = 4;
   int64 ended_at = 5;
   int64 updated_at = 6;
  Job job = 7; 
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes the most sense to me because the job config is an immutable property of the running process but for the sake of consistency we could compromise and copy ingress so they're at least separate objects

Comment on lines 30 to 36
message StartAgentJobRequest {
JobType type = 1;
string room_name = 2;
optional string participant_identity = 3;
string namespace = 4;
string metadata = 5;
}
Copy link
Contributor

@paulwe paulwe May 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the current implementation publisher jobs apply to all participants once they begin publishing. this doesn't fit into this api. multiple job ids could result from a single call and more jobs could start after the initial call.

i think we've accidentally overloaded the Job concept - in the original design this represented an invocation of the agent worker. what we're creating with this api call is a rule for dispatching jobs ie start a job for the room or start a job for publishers or for one publisher with the identity x.

Copy link
Member

@davidzhao davidzhao May 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just StartAgent / StartAgentRequest ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have 2 options:

  • Change the Job definition to represent 1 invocation for either the whole room, or 1 participant
  • Keep the current Job definition and remove the participant_identity argument in the Start message (and rename the rpc indeed).

Is there a use case for starting an Agent Job for a single participant? At first glance, it seems so in context where there is a main speaker. Short of doing that, we may eventually need some kind of filter argument to tell a 3rd party agent what participants to process.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current (before this PR) Job definition already has an optional ParticipantInfo field. How are agents currently expected to use it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'll want the option to specify one or more publishers a job should start for. we don't want the user to have to specify the identities of every publisher in the room if they want the job to start automatically for publishers though. i think we need a new concept to describe what this api operates on - i don't have a great name for it but WorkOrder fits the job/worker metaphor.

the publisher's ParticipantInfo is stored in this field by the agents framework when dispatching a publisher type job.

@paulwe
Copy link
Contributor

paulwe commented Jun 26, 2024

can we add a field for the number of jobs active during the period the load sample is taken from to the UpdateWorkerStatus message.

what is the process for associating running jobs with a new worker connection when the websocket reconnects? should we expect a flood of MigrateJobRequest messages? maybe that message type should support migrating multiple jobs at once?

@biglittlebigben
Copy link
Contributor Author

can we add a field for the number of jobs active during the period the load sample is taken from to the UpdateWorkerStatus message.

Added. We have to define what the active Jon count is though: the count at capture time, the max since last capture, ...?

what is the process for associating running jobs with a new worker connection when the websocket reconnects? should we expect a flood of MigrateJobRequest messages? maybe that message type should support migrating multiple jobs at once?

We have not implemented this anywhere yet AFAICT. I did change the message to use a list of ids though.

@biglittlebigben biglittlebigben merged commit 82786f4 into main Jun 27, 2024
1 check passed
@biglittlebigben biglittlebigben deleted the benjamin/agents branch June 27, 2024 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants